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ABSTRACT 

A new fast Bayesian approach is introduced for the detection of discrete objects immersed in 
a diffuse background. This new method, called PowellSnakes, speeds up traditional Bayesian 
techniques by; i) replacing the standard form of the likelihood for the parameters character- 
izing the discrete objects by an alternative exact form that is much quicker to evaluate; ii) 
using a simultaneous multiple minimization code based on Powell's direction set algorithm 
to locate rapidly the local maxima in the posterior; and iii) deciding whether each located 
posterior peak corresponds to a real object by performing a Bayesian model selection using 
an approximate evidence value based on a local Gaussian approximation to the peak. The 
construction of this Gaussian approximation also provides the covariance matrix of the uncer- 
tainties in the derived parameter values for the object in question. This new approach provides 
a speed up in performance by a factor of 'hundreds' as compared to existing Bayesian source 
extraction methods that use MCMC to explore the parameter space, such as that presented 
by Hobson & McLachlan (i2003:) . The method can be implemented in either real or Fourier 
space. In the case of objects embedded in a homogeneous random field, working in Fourier 
space provides a further speed up that takes advantage of the fact that the correlation ma- 
trix of the background is circulant. We illustrate the capabilities of the method by applying 
to some simplified toy models. Furthermore PowellSnakes has the advantage of consistently 
defining the threshold for acceptance/rejection based on priors which cannot be said of the 
frequentist methods. We present here the first implementation of this technique (Version-I). 
Further improvements to this implementation are currently under investigation and will be 
published shortly. The application of the method to realistic simulated Planck observations 
will be presented in a forthcoming publication. 

Key words: Cosmology: observations - methods: data analysis - cosmic microwave back- 
ground 



1 INTRODUCTION 

The detection and characterisation of discrete objects is a generic problem in many areas of astrophysics and cosmology. Indeed, one of the 
major challenges in such analyses is to separate a localised signal from a diffuse background. It is most common to face this problem in the 
analysis of two dimensional images, where one wishes to detect discrete objects in a diffuse background. 

When performing this task it is often assumed that the background is smoothly varying and has a characteristic length-scale much larger 
than the scale of the discrete objects being sought. For example, SExtractor (Bertin & Arnouts l(l996)) approximates the background emission 
by a low-order polynomial, which is subtracted from the image. Object detection is then performed by finding sets of connected pixels above 
some given threshold. Such methods run into problems, however, when the diffuse background varies on length scales and with amplitudes 
similar to those of the discrete objects of interest. Moreover, difficulties also arise when the rms level of instrumental noise is comparable to, 
or somewhat larger than, the amplitude of the localised signal one is seeking. 

A specific example illustrating the above difficulties is provided by high-resolution observations of the cosmic microwave background 
(CMB). In addition to the CMB emission, which varies on a characteristic scale of order ~ 10 arcmin, one is often interested in detecting 
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emission from discrete objects suchi as extragalactic 'point' (i.e. beam-siiaped) sources or the Sunyaev-ZeI'dovich (SZ) effect in galaxy 
clusters, whicfi iiave characteristic scales similar to that of the primordial CMB emission. Moreover, the rms of the instrumental noise in 
CMB observations can be greater than the amplitude of the discrete sources. In such cases, it is not surprising that straightforward methods, 
such as that outlined above, often fail to detect the localised objects. 

The standard approach for dealing with such difficulties is to apply a linear filter to the original image d{x) and instead analyse 
the resulting filtered field 



Suppose one is interested in detecting objects with some given spatial template t[x) (normalised for convenience to unit peak amplitude). If 
the original image contains A^obj objects at positions Xi with amplitudes Ai, together with contributions from other astrophysical components 
and instrumental noise, then 



where a{x) is the signal of interest and n{x) is the generalised background 'noise', defined as all contributions to the image aside from the 
discrete objects. It is straightforward to design an optimal filter function ^l){x) such that the filtered field 0} has the following properties: (i) 
df{Xk) is an unbiassed estimator of Au (ii) the variance of the filt ered n oise field nf{x) is minimised; The corresponding function ip{x) is 
the standard matched filter (see, for example, Haehnelt & Tegmark Il99dt)) One may consider the filtering process as 'optimally boosting' (in 
a linear sense) the signal from discrete objects, with a given spatial template, and simultaneously suppressing emission from the background. 

An additional subtlety in most practical applications is that the set of objects are not all identical, but one can repeat the filtering process 
with filter functions optimised for different spatial templates to obtain several filtered fields, each of which will optimally boost objects with 
that template. In the case of the SZ effect, for example, one might assume that the functional form of the template is the same for all clusters, 
but that the 'core radius' differs from one cluster to another. The functiona l form of the filter function is then the same in each case, and 
one can repeat the filtering for a number of different scales (Herranz et al. ( l2002j) ). Moreover, one can trivially extend the linear filtering 
approach to the simultaneous analysis of a set of images. Once again, the SZ effect provides a good example. Owing to the distinctive 
frequency dependence of the thermal SZ effect, it is better to use the maps at all the observed frequencies simultaneously when attempting 
to detect and characterise thermal SZ clusters hidden in the emission from other astrophysical components (Herranz et al. j2002t|) ). 

The approaches outlined above have been shown to produce good results, but the filtering process is only optimal among the rather 
limited class of linear filters a nd is l ogically separated from the subsequent object detection step performed on the filtered map(s). As a 
result, Hobson & McLachlan hereinafter HM03) introduced a Bayesian approach to the detection and characterisation of discrete 

objects in a diffuse background. As in the filtering techniques, the method assumed a parameterised form for the objects of interest, but the 
optimal values of these parameters, and their associated errors, were obtained in a single step by evaluating their full posterior distribution. 
If available, one could also place physical priors on the parameters defining an object and on the number of objects present. Although this 
approach represents the theoretically-optimal method for performing parametrised object detection, its implementation was performed using 
Monte-Carlo Markov chain (MCMC) sampling from the posterior which was extremely computationally intensive. Although cons iderab le 
progress has recently been made in increasing the efficiency of sampling-based Bayesian object detection methods (Feroz & Hobson 
such approaches are still substantially slower than simple linear filtering methods. Therefore, in this paper, we explore a new, fast method for 
performing Bayesian object detection in which sampling is replaced by multiple local maximisation of the posterior, and the evaluation of 
errors and Bayesian evidence values is performed by making a Gaussian approximation to the posterior at each peak. This approach yields 
a speed-up over sampling-based methods of many 100s, making the computational complexity of the approach comparable to that of linear 
filtering methods. 

The outline of the paper is as follows. In section[2]we provide a brief outline of the essential Bayesian logical framework which supports 
the employed methodology. The previously defined entities (see section |2j are then characterized in terms of the source detection problem 
in section [3] A summary of the strategy to fulfil the desire goals is then suggested in section |4] In section [5] we develop the first step of the 
proposed algorithm: The maximisation of the posterior distribution and we give account of the errors bars on the estimated parameters of 
the putative source under study. A two step approach split between Fourier and real space is indicated. The second phase of the detection 
algorithm: The Bayesian validation procedure which we apply on the previously found likelihood maximum is explained in section |6] A 
prior is suggested and a upper bound on the quality of the detection is advanced. We end this section by considering a method for correcting 
possible systematic deviations on the evidence evaluation. In section |7] we give a detailed study on the effects that develop when adding 
correlation into the diffuse background that produces the substructure in which the sources are imbedded. The results from a series of 
simulations covering several typical scenarios are presented in section [8] and we close indicating our conclusions and possible directions for 
further work in section |9] 



2 BAYESIAN INFERENCE 

Bayesian inference methods provide a consistent approach to the estimation of a set parameters in a model (or hypothesis) H for the data 
d. Bayes' theorem states that 




(1) 




(2) 



fc=i 



Pr{@\d,H) 



Pv{d\ &,H)Pt{&\H) 
Pr{d\H) 



(3) 



© 2007 RAS, MNRAS 000.[Tlt30l 



Fast Bayesian object detection 3 



where Pr(0|rf, H) = P{@) is the posterior probabihty distribution of the parameters, Pr(rf|0, H) = L{@) is the likehhood, Pr(0|//) = 
7r(0) is the prior, and 'Px{d\H) = E is the Bayesian evidence. 

In parameter estimation, the normahsing evidence factor is usually ignored, since it is independent of the parameters 0. This (un- 
normalised) posterior constitutes the complete Bayesian inference of the parameter values. Inferences are usually obtained either by taking 
samples from the (unnormalised) posterior using MCMC methods, or by locating its maximum (or maxima) and approximating the shape 
around the peak(s) by a multivariate Gaussian. 

In contrast to parameter estimation problems, in model selection the evidence takes the central role and is simply the factor required to 
normalize the posterior over 0; 

2 = J L(0)7r(0)d°0, (4) 

where D is the dimensionality of the parameter space. As the average of the likelihood over the prior, the evidence is larger for a model if 
more of its parameter space is likely and smaller for a model with large areas in its parameter space having low likelihood values, even if 
the likelihood function is very highly peaked. Thus, the evidence automatically implements Occam's razor: a simpler theory with compact 
parameter space will have a larger evidence than a more complicated one, unless the latter is significantly better at explaining the data. The 
question of model selection between two models Ho and Hi can then be decided by comparing their respective posterior probabilities given 
the observed data set d, as follows 

PrjHild) ^ Pr(rf|Hi)Pr(jj-i) ^ Zi Pr(gi) 
PT{Ho\d) Pr{d\Ho)PT{Ho) ZoPriHo)' 

where Pt{Hi)/ Pr{Ho) is the a priori probability ratio for the two models, which can often be set to unity but occasionally requires further 
consideration. Unfortunately, evaluation of the multidimensional integral (O is a challenging numerical task. Nonetheless, a fast approximate 
method for evidence evaluati on is to model the posterior as a multivariate Gaussian centred at its peak(s) and apply the Laplace formula (see 
e.g. Hobson, Bridle & Lahav ( l2002h . thereinafter HBLOl). 



3 BAYESIAN OBJECT DETECTION 

A Bayesian approach to detecting and characterizing discrete objects hidden in some background noise was first presented in an astronomical 
context by HM03, and our general framework follows this closely. For brevity, we will consider our data vector d to denote the pixel values in 
a single patch in which we wish to search for discrete objects, although d could equally well represent the Fourier coefficients of the image, 
or coefficients in some other basis. 



3.1 Discrete objects in a background 

Let us suppose that we are interested in detecting and characterizing some set of (two-dimensional) discrete objects, each of which is 
described by a template r(x;a). This template is defined in terms of a set of parameters a that might typically denote (collectively) the 
position {X, Y) of the object, its amplitude A and some measure R of its spatial extent. A particular example is the circularly-symmetric 
Gaussian-shaped object defined by 



r(x;a) = ylexp 



(x-Xf + iy-Yf 



(6) 



so that a = {X, Y, A, R}. In what follows, we will consider only circularly-symmetric objects, but our general approach accommodates the 
templates that are, for example, elongated in one direction, at the expense of increasing the dimensionality of the parameter spaceQ In the 
analysis of single-frequency maps, most authors take the template r(x;a) to be the (pixelised) intrinsic shape of the object convolved with 
the beam profile. For multi-frequency data, however, it is more natural to treat the intrinsic object shape and the beam profiles separately. 
If ^obj objects are present in the map and the contribution of each object to the data is additive, we may write 



obj 



d = n+Y,<ak), (7) 



fc=i 



where s(ak) denotes the contribution to the data from the fcth discrete object and n denotes the generalised 'noise' contribution to the data 
from other 'background' emission and instrumental noise. Clearly, we wish to use the datarf to place constraints on the values of the unknown 
parameters A^obj and Uk {k = 1, . . . , A'obj )■ 



^ Such an elongation can be caused by asymmetry of the beam, source or both. If our model assumes a circularly symmetric source but an asymmetric beam, 
the elongation will be constant across the map, but not if the sources are intrinsically asymmetric. In the latter case, one thus has to introduce the source 
position angle as a parameter, in addition to the elongation. 

2 In its early stages PowellSnakes treated a point source convolved with the beam as a single parameterized template. In the cuiTent version these are separate 
entities. 
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3.2 Defining the posterior distribution 

As discussed in HM03, in analysing the data the Bayesian purist would attempt to infer simultaneously the full set of parameters = 
{Nohj,(ii,a2, ■ . ■ ,ajv„^j). The crucial complication inherent to this approach is that the length of the parameter vector © is variable, since 
it depends on the unknown value A'^obj ■ Some sampling-based approaches are able to move between spaces of different dimensionality, and 
such techniques were investigated in HM03. 

An alternative approach, also discussed by HM03, is simply to set A'obj = 1- In other words, the model for the data consists of just 
a single object and so the full parameter space under consideration is a = {X, Y, A, R}, which is fixed and only 4-dimensional. Although 
we have fixed A'obj = 1, it is important to understand that this does not restrict us to detecting just a single object in the map. Indeed, by 
modelling the data in this way, we would expect the posterior distribution to possess numerous local maxima in the 4-dimensional parameter 
space, each corresponding to the location in this space of one of the objects present in the image. HM03 show this vastly simplified approach 
is indeed reliable when the objects of interest are spatially well-separated, and we adopt this method here. 



3.3 Likelihood 

In this case, if the background 'noise' n is a statistically homogeneous Gaussian random field with covariance matrix A' = {nn^), then the 
likelihood function takes the form 

e^p{-l[d-s{a)]^N-^\d-s{a)]] 

Moreover, if the background is just independent pixel noise, then A' = a^I, where a is the noise rms. 
In the general case, the log-likelihood thus takes the form 

InL(a) = c- i [d-s{a)Y [d-s(a)] (9) 

where c is an unimportant constant. The first innovation in our new method is to re-cast the log-likelihood in such a way that the computational 
cost of evaluating it is considerably reduced. This is achieved by instead writing the log-likelihood as 

Ini(a) = c' - \s{afN'^s{a.) +d'^N'^s{a), (10) 

where c = c — ^d'^N^^d is again independent of the parameters a. The advantage of this formulation is that the part of the log-likelihood 
dependent on the parameters a consists only of products involving the data and the signal. Since the signal from a discrete source with 
parameters a is only (significantly) non-zero in a limited region centred on its putative position one need only calculate the quadratic forms 
in over a very limited number of pixels, whereas as the quadratic form in the standard version ^ must be calculated over all the pixels 
in the map. 

In the case of independent pixel noise, the covariance matrix A' in l llOt is diagonal, and so the quadratic forms can be calculated very 
rapidly. For correlated noise, however. A' is no longer diagonal and so evaluating the necessary elements of its inverse can be costly for a 
large map. Nonetheless, if the background is a statistically homogeneous random field A'"^ can be computed in Fourier space, where it is 
diagonal. As a result, we perform our analysis of sufficiently small patches of sky that the assumption of statistical homogeneity is reasonable. 
An alternative procedure would be to transform to a set of basis functions, such as wavelets, in which the data are readily compress, hence 
reducing the effective dimensionality of the problem; we will not, however, pursue this route. 



3.4 Prior 

Having determined the likelihood function, it remains only to assign a prior on the parameters a = {X, Y, A, R). Although not a formal 
requirement of our method, the simplest choice is to assume the prior is separable, so that 

7r(a) = tt{X)tt{Y)tt{A)tv{R). (11) 

Moreover, we will assume that the priors on X and Y are uniform within the range of the image, and that priors on A and R are the uniform 
distributions within some ranges [Amin, ^max] and [-Rmin, J?max] respectively. This is true for our first implementation of PowellSnakes 
(version I). Our general approach can, in fact, accommodate more general priors. Indeed, the assumption of uniform priors on A and R is 
not a good one for most astrophysical problems. Typically one expects the number of discrete sources to decrease with amplitude, and their 
angular sizes are unlikely to be uniformly distributed. The best prior to adopt for A is one derived from existing source counts. These are 
typically power laws over a wide range (although care must be taken at the ends of the range to obtain a properly normalised prior). In our 
approach presented below, however, we will assume the log-posterior is at most quadratic in the source amplitude A, which restricts Tr{A) to 
be of uniform, exponential or Gaussian form. (The second implementation (version II) of PowellSnakes currently under development adopts 
a Power law.) Turning to Tr{R), existing knowledge of the angular sizes of the objects sought can be use to construct an appropriate prior, 
and no restriction is placed by our method on this. Of course, in the case of point sources, the overall template is simply the beam profile, the 
angular size of which is (usually) known in advance. [f| 



Ongoing work on an improved implementation of PowellSnakes (Version II) uses the following priors: uniform priors on the position, X and Y, and scale, 
R, while the brightness, source amplitude A, is modelled by a Power-law: N{s) oc s~l^ with (3 in [2, 3), where s is the source flux. The range for parameter 
R is taken from a parameter file. 
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3.5 Optimal patch size. The patch border 

The patches are divided into two different areas: A central area where the detection takes place and a surrounding border. The border is used 
to prevent the sources from being truncated by the edges of the patch since this usually impacts severely on the process of their detection 
and characterization. In order to avoid truncation, the border size must be at least half the largest source radial size. When enforcing this rule 
we are assuring that a source profile which has been cut by the patch edge is left undisturbed at least on one of its neighbour patches. When 
defining an optimal patch size several somewhat opposing criteria must be taken into account: 

• The assumption of statistical homogeneity which always favours small patches 

• Execution speed which tends to exhibit a bias towards somewhat larger patches. 

- The greater the number of patches the greater the number of border overlaps which occur. This increases the number of pixels 
processed more than once. 

- There is a non-negligible amount of time preparing a new patch for detection. 

- The pre-filtering stage which involves an FFT requires a power of 2 array size in order to be efficient 

Our work shows that the optimal patch size is always the one which minimises the effects of the in-homogeneities of the background. 
However, (128 x 128) pixels patches on the NSide=1024 Healpix maps (~ 7.33° x 7.33°) and (256 x 256) pixels patches on NSide=2048 
Healpix maps (~ 7.33° x 7.33°) are always good choices in terms of being a good balance between statistical homogeneity and execution 
speed. 



4 OBJECT DETECTION STRATEGY 

Once the likelihood and prior have been defined, the problem of object identification and characterization reduces simply to exploring the 
resulting posterior distribution 

In P(a) = In L(a) + In 7r(a), (12) 

where we have omitted an unim portan t additive constant term on the right-hand side. Rather than using MCMC methods, as advocated 
by HM03 and Feroz & Hobson l l2007l) . here instead we locate the local maxima of the posterior in the 4-dimensional parameter space 
a = {X, Y, A, R} and perform a Gaussian approximation about each peak. The former provides the estimates of the object parameters 
associated with that posterior peak and, as outlined below, the latter allows one to assign uncertainties to the derived parameter values. The 
Gaussian approximation also allows one to estimate the Bayesian evidence associated with each posterior peak. The Bayesian evidence is 
used to decide whether the detected peak corresponds to a real object or is just a 'conspiracy' of the background noise field (see Section[6]l. 

Such an approach was also advocated by HM03, in which a simulated annealing downhill simplex minimiser was used in an iterative 
object detection scheme. At each iteration, the minimiser located the global maximum of the posterior and an object with the optimal 
parameter values was subtracted from the map before commencing the next iteration. The process was repeated until the first rejected 
detection. 

Here we adopt a different strategy that obviates the need to attempt to locate the global maximum at each iteration, since this is very 
computationally expensive. Instead, our approach consists of launching a series of simple downhill minimisations. The choice of starting 
point for each minimisation, the minimiser employed and the number of minimisations performed are discussed in detail in Section |5] The 
end-point of each minimisation launched will be a local maximum in the posterior, giving the optimal parameter values for some putative 
detected object. A Gaussian approximation to the posterior is constructed about the peak and the detection is either accepted or rejected based 
on an evidence criterion (see Section[6]l. If the detection is accepted, then an object with the optimal parameter values is subtracted from 
the map before the next downhill minimisation is launched. Although our method does leave open the possibility of repeatedly detecting the 
same spurious sources, the speed of our algorithm assures that this is not a problem (see Section [Sj. 

It should be noted, however, that the only reason for subtracting accepted objects from the map is to avoid multiple real detections. 
Moreover, when attempting to detect sources at very low signal-to-noise ratios, one might occasionally accept a spurious detection as a real 
source, and its subtraction from the map may damage subsequent downhill minimisations. Ideally, therefore, one should avoid subtracting 
sources altogether. One such approach is simply to accept that real objects may be detected repeatedly, and check for duplications in the 
optimal sets of parameters obtained in each accepted detection (e.g. parameters sets that are similar to well within the derived uncertainties). 
We have not implemented such a method for this paper, but this is under investigation. 

As discussed in Section|5]determining the number of the downhill minimisations to perform is a key part of our method. However once 
we perform these minimizations , we ensure that no real object has been missed by performing a further set of downhill minimisations as 
follows. Since the spatial extent of any object in the data map must be greater than that of the beam, we divide the map into patches of area 
equal to the beam size. For each such patch, we then launch a downhill minimiser with initial values of X and Y equal to the coordinates 
of the centre of the patch, and with A and R values equal to the midpoints of their respective prior ranges . This is the case for our first 
implementation of PowellSnakes (version I, for white noise only). This approach is unlikely to miss any remaining peaks in the posterior 
associated with real objects, provided the number density of sources is sufficiently low that they are spatially well-separated. Clearly, such an 
approach will perform less well if two or more sources (partially) overlap, in which case a single source of large spatial extent may be fitted 
to them. 

In the earlier stages of PowellSnakes implementation we performed the analysis completely in real space, rather than pre-filtering the 
map. However this procedure slowed down the execution of the algorithm. The evaluation of the objective function for each set of parameters 
is a very expensive operation which unfortunately lies deep in the inner loop of the code, ending up being called several hundreds of times 
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for each ''snake". Even a small optimization of its execution has a gigantic impact on the overall performance of the code. We initialize the 
Powell minimizer, (ie the tail of each of the "snakes"), with the central points of the grid defined in subsection 15.21 and the midpoints of 
the [A,R] parameters. The first step of the minimization process is always to search the position sub-space keeping R and A constant. As 
we have to repeat this step each time we start a new "snake", finding a fast way of evaluating the likelihood in all the positional sub-space 
available is very advantageous. This is where the filtering process steps in. Filtering with the matched filter automatically provides us with a 
fast evaluation of the likelihood restricted to the positional subspace. Furthermore it gives a very good estimate of the starting value for the A 
parameter, using the classical formula of the amplitude of a field filtered by a matched filter l ll6b . Next, we need to find out the optimal value 
of R parameter which is no longer considered constant. 

As a final note, one must take care when a source is located near the edge of the map, since it will be partially truncated. This is not, 
however, a severe limitation as one can allow an overlap of at least the large expected size when dividing the full map into small patches; a 
truncate source in one patch will appear intact in a neighbouring one. This strategy does result in a small extra processing overhead, since 
several pixels will be processed more than once, but this is a minor consideration. 



5 MAXIMISING THE POSTERIOR DISTRIBUTION 

In this section, we outline the method employed to locate the multiple local maxima of the posterior distribution. 



5.1 Pre-flltering step 

Let us begin by writing the template ^ as 

r{x-a) ^ At{x-X:,R), (13) 

where A, X and R are, respectively, the amplitude, vector position and size of the object, and t{x; R) is the spatial shape of a unit height 
reference object of size R centred on the origin. Then the signal vector in the log-likelihood l llOt is simply 

s{a) ^ At{X,R), (14) 

where the vector t has components ti = t{xi —X;, R), in which JC; is the position of the ith pixel in the map. Substituting the above expression 
into the log-likelihood, assuming a constant prior and differentiating with respect to A gives 

= t'^{X,R)N-'[d - At{X, R)], (15) 
which on setting equal to zero yields an analytic estimate for the amplitude: 

^^^^''^ = t^X,R)N-h{X,Ry ^'^^ 

Note that, under the assumption of statistical homogeneity, the denominator in the above expression does not depend on the source position 
X, and thus n eed only be calculated once (say for a source at the origin) for any given value of R. Generalising the notation of Savage & 
Oliver l l2006l) . we denote this denominator by a{R) and the numerator in l ll6t by ^{X, R). It is worth noting that the quantity ^ a[R) is 
merely a generalised signal-to-noise ratio for a unit amplitude object integrated over its spatial template. 

More importantly the estimator M6\ is precisely the filtered field produced by a classical matched filter, in which one assumes a value 
for R. Moreover, it is straightforward to show that the corresponding log-likelihood JlOb can be written 

\nL{X,A,R) = c + \a{R)[A{X,R)f. (17) 

Thus, we show here that for a given value of R, peaks in the filtered field correspond precisely to peaks in the log-likelihood considered as a 
function of X. If the objects sought all have the same size R and this is known in advance, then the local maxima of the now 3-dimensional 
posterior in the space [X, Y, A) are all identified by this method. This scenario corresponds to point sources observed by an experiment with 
a known beam profile. However the assumption that the sizes of the objects are known in advance might lead us astray. To show that this is 
indeed the case, consider the following: 

• Suppose two sources are so close they cannot be resolved by the optical device: The intensity profile usually still matches fairly well 
the of beam profile but with a much enlarged radial scale. Taking the radius of the beam profile as an unknown parameter, PowellSnakes will 
still identify them wrongly as a single source, but the total estimated flux will much closely match the true value of the total flux coming 
from both sources. 

• We are assuming a statistical homogeneous background: this is never strictly true, but keeping the patches small enough this is usually a 
good assumption. Nevertheless, small changes in the beam profile across the patch, which in terms of the noise components (CMB, Galactic 
foreground, etc.) may be perfectly ignored, when estimating the sources fluxes might lead us to severely biased values. 

• We are assuming point sources: this limits the range of application of this method. One of the driving motivations behind PowellSnakes 
is the detection of SZ clusters. This implies the capability of multi-frequency operation and a source profile with a well defined shape. This 
source shape template is only useful in real situations if we are to allow different spatial scales. In order to achieve this extra degree of 
freedom we introduced the Radius [7?] parameter. 
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Figure 1. Log-likelihood manifold, equation jlOt using template (6) (A and R dimensions suppressed); white noise background, R parameter = 4.0 pixels; a 
constant has been subtracted 



Whe n the source size R is not known in advance, or the sources have different sizes, most matched filter analyses (see e.g. Herranz et 
al. ( l2005h ) filter the field at several predetermined values of R and combine the results in some way to obtain estimates of the parameters 
{X, Y, A, R) for each identified source. This approach is rather ad hoc, however, and so we will pursue a different strategy based on locating 
the local maxima of the posterior using multiple downhill minimisations. At first sight it might seem tempting to use the result l ll6t to 
reduce the parameter space to the 3 dimensions {X,Y,R), since with knowledge of these parameters one can immediately calculate A. 
Such an approach can, however, lead to numerical instabilities when the denominator a(R) in l |16l ) is very small (which occurs in the low 
signal-to-noise regime) as our numerical tests confirmed. Thus we do not pursue this approach. 

We do, however, make use of the result il6i by first pre-filtering the map assuming a value for R equal to the mean of its prior range. 
The advantage of this pre-filtering stage is that, even in the presence of objects of different sizes, the filter field often has a pronounced local 
maximum near many real objects in case the R parameter range is not too wide and the value of R employed in the filtering process stays 
close to (figure[TJ. 



5.2 2-diinensional Powell minimisation step 

In the next step of our analy sis, we launch A'^ — 100 (see below) downhill minimisations in the two-dimensions {X, Y) using the Powell 
algorithm (Press et al. with starting points chosen as described below. This method is a simple direction-set algorithm that requires 

only function values. The method makes repeated use of a 1-dimensional minimisation algorithm, which in our case is Brent's method; this is 
an interpolation scheme that alternates between parabolic steps and golden sections. In addition, we 'enhance' the basic Brent line minimizer 
algorithm by introducing a 'jump' procedure designed to avoid small local minima (such as those resulting in the posterior from the noise 
background). A description of this 'jump' procedure is given below. The reason for this 2-dimensional minimisation step is simply to obtain 
a good estimate of the {X, 'K)-positions of the sources. As a byproduct, using formula il6\ . we end up obtaining an approximate estimate of 
the putative source amplitude A as well. 

Assuming that sensitivity and experiment resolution are such that we should only expect a very small number of partially resolved 
sources, one may assume that in each patch of beam size area we should not find more than one real peak. The likelihood peaks in the 
positional sub-space (keeping A and R constant) are not expected narrower than the original beam at least when we are dealing with the 
white noise only case. This is so because the likelihood positional surface is nothing but the correlation of the beam shape with, on average, 
another beam shape eventually with a different R. Thus the total area where the correlation will have values significantly different from is 
expected to be larger than the largest area of each one of the original beams. These assumptions naturally define the coarseness one should 
use when exploring the patch, namely, the beam size. As we are assuming that beam shapes with different radii may actually occur, the 
largest possible resolution happens when the smallest beam is considered. So far there is no prior information to eventually distinguish one 
part of the patch from the others, therefore we assume that a uniform rectangular grid with as many cells as the total number of "beams in 
the patch", would be perfectly adequate to define the starting points of the "snakes". Thus; 



^ Patch total number of pixels inside the 'core' area ^^^^ 

Size of the smallest beam (in pixels) 

We add an extra two dimensional random offset from the central points of each grid cell in order to avoid some unwanted correlation 
effects due to the perfect spacing of the grid. Some may argue that, in addition to the likelihood real peaks (those resulting from the sources 
in the map), we should expect a large number of fake ones resulting from the interaction of the filter with the noise fluctuations (figure[Tll. 
We considered such possibility by devising a "jump" procedure (see below) to avoid the "snakes" from being "caught" on those fake peaks. 
Nevertheless, as in the case of a CMB background, when the number of these spurious peaks largely outnumbers the real ones, one should 
increase the density of "snakes" (figure|2j. For the white noise detection examples presented in section[8] we have taken all of the above into 
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Figure 2. The likelihood ratio manifold of the models for the presence of a source over its absence j-^ ™d R dimensions suppressed): radius R = 4.0 
pix; Background parameters sq = 11.57 pix, wj = 0.01, wg = 0.99 (see sectionl?) 




Figure 3. The jump procedure 



consideration. The number of snakes chosen, A'snakes = 100, meets these requirements, for both cases (ie LISNR and HMcL, see section[8]l. 
As we assumed that no sources are placed into the patch borders we consequently did not include those pixels. 



5.2.1 The jump procedure 

The Brent line minimizer keeps at any time a set of three pairs of values, argument and function value, where the middle pair is said to be 
'^bracket". This is because the argument part of the second pair lies in the interval defined by the other two and its function value is lower 
than the function values of the interval limiting pairs. We call them 

([a, f{a)] Left bound 
[b, f{b)] Current "particle" solution (19) 
[c, /(c)] Right bound 

Like the simulated annealing example previously referred, the rational behind the jump procedure was borrowed from natural world, 
but this time from the realm of quantum mechanics. We imagine our current best guess, the bracketed pair, [b, /(&)], as the particle inside a 
potential well defined by the limiting pairs (left and right bound). If we adopt the "classical" point of view there is no way the particle could 
possibly escape from the bracketing barriers. This is exactly what happens with the traditional Brent procedure. Instead if we use a quantum 
approach it is still possible for the particle to tunnel out from the barrier. 

The transmission coefficient T of a particle with energy E which faces a potential barrier with a height equal to Vo(Vb > E), and a 
width I, is given by: 
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where p2 is = y — —, m is the particle mass and h is Dirac constant, where we assume P2I <^ 1 (Claude Cohen-Tannoudji et al. 
( Il973h ). From the quantum mechanical example we have only retained the negative exponential dependence with the barrier length (the jump 
distance in our jump model) and the characteristic constant depending on the difference between the particle energy and the barrier maximum 
potential. Following this example we define a random variable A exponentially distributed 

p(A = A) = ie-^/^ (21) 
with average 0. The next step is to define 6 dependence on the parameters, bearing in mind the tunnelling effect behaviour We chose: 

0-L[i^) (22) 

where L is the "maximum average jump length", fs is the "function values scale", and A is either the "right barrier" = Ar when we jump 
forward or "left barrier" = A; when we are trying a jump along the opposite direction (figure|3j. 

The procedure has two tuning parameters: one (whose value is given as input from the parameter file) to divide the maximum allowed 
range for the current jump in order to produce the L value, and another one which purpose is to define a scale for the function values. This 
second value: fs is the maximum range of function values evaluated from the "snake 's tails" locations (initial values drawn from the grid 
as explained above). When we start the minimization process by defining the bracketing interval, the well walls (defined by the differences 
of the function values. A,- and A;) are usually large and consequently only very small tunnels will be allowed. If A > /s no jumps will be 
tempted. As the minimizer converges and approaches the minimum, the barriers will become smaller and consequently bigger tunnels will 
become more and more probable. This jump procedure has the advantage of exploring the function domain progressively as the algorithm 
approaches the bottom of the well. If by accident we have first hit the global minimum when the solution is still far from the bottom only 
small jumps are allowed and the probability of tunnelling to a local peak is small. As soon as we get to the bottom, A ~ 0, large tunnels 
become probable (6 ~ L), but as the function is now close to its lowest value, a successful jump is now improbable. But if instead we have 
started within a local peak instead of becoming "stuck", as the minimizer approaches its last steps, large jumps become common and the 
probability of "plunging" into a deeper well is now very high. One of the main reasons for choosing the exponential distribution, apart its 
close relationship with the quantum tunnelling, is the simplicity and efficiency of a random deviates generator whose output will follow it 
(provided we already have a routine which produces random deviates with a uniform probability distribution inside a certain interval). We 
proceed as follows: 

For each Brent iteration: 

• Draw a sample u from an uniform distribution between [0, 1). If u ^ 0.5 choose a positive jump, otherwise move backwards. 

• Draw from the tunnelling distribution. If the jump ends up inside the "inner zone" we reject it because we are on the same well we have 
started from. If the jump is outside the "inner zone" we evaluate the function at that point [z, f(z)]. If the new function value is lower than 
the one that initiated the jump, f{z) < /(&), we accept it and restart the Brent minimizer with this new value. 

5.2.2 The effectiveness of the jump procedure in recovering "lost snakes" 

Our extensive set of simulations covering a typical assemblage of different scenarios shows that the use of the jump procedure has a fairly 
good effectiveness. We have achieved equal levels of quality on the obtained results using ~ 20-25 % fewer snakes. However, within our 
current setup its importance is far from critical since speed is not a concern to us. The extra level of complexity which results from the 
introduction of the procedure, could have been avoided by launching extra "snakes". However, when dealing with such a scenario where the 
cost of starting additional "snakes" is much higher or the problem cannot just be circumvented by setting up more starting points for the 
minimization procedure, we believe that the jump procedure might have an important role to play. 

5.3 4-dimensional Powell minimisation step 

Instead of just repeating the filtering procedure using different values for the R parameter, we proceed by following a rationale which tries 
to mimic that behind the FFT immense speed up. FFT is fast because it reduces, using factorization, the number of operations needed to 
compute the "complete set" of Fourier coefficients. If we only need one or two (if the number of coefficients we want to compute is less than 
ln(n); n being the size of the vector) then it would be more advantageous to use the traditional formula directly. Hence, we now search the 
full parameter space {X, Y, A and K) with the help of the Powell minimizer, but this time using the previously estimated parameters (from 
the 2-dimensional step) as starting values. We are not expecting the "Powell snake" to unfold significantly outside a small neighbourhood of 
the initial value. Thus, as we do not need to evaluate the likelihood outside this small positional range anymore (as in the FFT case where we 
just want to compute one or two coefficients) using the real space proves to be more advantageous. 

In the next step, we perform multiple Powell minimisations in the full 4-dimensional space, starting each minimisation from the 
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{X, Y, A, R) - positions found in the previous step. In tliis step the 'jump' procedure described above is not used. This is one of the main 
reasons why we have first performed the 2-dimensional step (and the pre-filtering). From our experience we could verify that after performing 
the 2-dimensional step, the neighbourhood of the likelihood manifold around the vector solution {X, Y, A, R} we had so far obtained was 
rather smooth and "well behaved". Thus, the necessity for a procedure in order to avoid local maxima is no longer required. The resulting 
end-point of each minimisation is a local maximum of the posterior, giving the optimal parameter values for some putative detected object. 
A Gaussian approximation to the posterior is constructed about the peak and the detection is either accepted or rejected based on an evidence 
criterion (see Section[6]l. If the detection is accepted, then an object with the optimal parameter values is subtracted from the map before the 
next downhill minimisation is launched. 



5.4 Error estimation - Fisher analysis 

For each accepted object, estimates of the uncertainties on the derived parameters are obtained from the Gaussian approximation to the 
posterior around that peak. In detail the covariance matrix of the parameter uncertainties has elements dj = (SaiSaj), which can be 
obtained by inverting (minus) the Hessian matrix at the peak. In the case of uniform priors, one has simply 



(23) 



Here we present an estimation of the putative source parameters' error bars using the Fisher analysis over the model Hi. We present 
two different cases: 

White noise The analysis is carried out in real space 

"Coloured" noise Fourier space is used instead (see subsection |73} 

We start by defining Fij as the negative expectation of the log-likelihood Hessian matrix, in other words the "Fisher information matrix". 
We are assuming, as we have already stated, a Gaussian approximation of the likelihood around its maximum in parameter space. 
Let Lhi = Pr(D|0 J/i) be the likelihood of model Hi, then 



Assuming a Gaussian model for the diffuse background we have: 

- In (Lh,) = c + ^ iV-/d,r, - i ^ N;^t,t, - i ^ Nr/d,d„ (25) 

i,j i,3 i,3 

where N~^^ is the {ij) element of the inverse correlation matrix of the diffuse background, p, is the i map pixel, is the i pixel of 
the object template t{v, A,@,ro) and c is an unimportant constant. The parameters of the object template are: A the amplitude of the 
object, ro the positional parameters and which represents the ensemble of the remaining parameters. In this analysis we are assuming a 
circular symmetric source. Thus, contains only a single parameter, R which provides a scale for the radial dimension of the object, ie, 
R = "Radius of the object". We will assume, from now on, the following functional dependence of the source template on its parameters: 

T{r,A,&,ro)=AtO-^-jp^'] (26) 



5.4.1 White noise 

For a background diffuse component generated from a Gaussian stationary process with coherence length much smaller than the pixel size, 
ie Gaussian white noise, the following condition applies: 

^r/ = -^Sij (27) 

This is the situation when dealing with a system where the instrumental pixel noise is the dominant source of the background component. 
Adding the condition for statistical homogeneity : 

' ' (28) 
and substituting both conditions into the log-likelihood expression this becomes 
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(29) 



Evaluating expression l l24t for all the possible parameters' combinations and using the following sort criterion {A, R, X, Y}, one obtains 



where {q, /3, 5, 7} are pure numerical constants 
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(30) 



(31) 



^ and © = |i,7?,X,y| are the values of the template parameters which maximize the likelihood. We have used pixel 



metrics together with the assumption that the sums can be evaluated using the continuous limit (this introduces only minor errors). When 
considering a source template with exactly the same functional dependence of equation l|6), the numerical constants become: 



Now inverting the F matrix 
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(32) 



(33) 



where V = af3 — S and using the Cramer-Rao inequality, which in this case reduces to an equality: 



(34) 

since we are dealing with a max-likelihood estimator which is efficient. Writing explicitly the above expression ( 1341 ) for each parameter, one 
finally obtains the desired parameters' error bars: 



AA ■ 



' J2- AT? = ^ 



AX — — ^— 



AY 



A 



Defining PSNR = — , "Peak Signal to Noise Ratio" , and substituting into i35i : 



(35) 



AA 



4X>' PSNR 



AX = 



AY = 



PSNR yi^' PSNR ^/Ty ' 

we get another way of expressing the error bars, where we have emphasized their dependence on the "quality" of the signal. 



(36) 



6 BAYESIAN OBJECT VALIDATION 

It is clear that a key component of our approach is the step for accepting/rejecting as a real object each posterior peak. Indeed, a reliable 
means for performing this step is crucial for the method to function properly. The decision whether to accept or reject a detection is based 
on the evaluation of the Bayesian evidence for two competing models for the data given by equation (|5](. PowellSnakes evaluates the models 
evidence using a Gaussian approximation to the logarithm of the likelihood ratio of the competing models 

m (^4^^) , (37) 



Pr{D\Ho) 

It does so by expanding Equation ( |37b around its maxima in the parameter space and rejecting all the terms equal and above the third 
order. As the null model (Hq = "There is no source centred in region 5") doesn't have any parameter, thus behaving like a constant when 
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considering the parameters space, maximizing the logarithm of likelihood ratio holds the same results as the max-likelihood estimator of 
the source parameters of the Hi model only. Therefore finding the peaks of the logarithm of likelihood ratio, corresponds to computing the 
max-likelihood estimator of the putative source parameters. 

The precise definition of the models Hq and Hi are given below. In general, however, we shall adopt a formulation of the model 
selection problem in which each model Hi (i — 0, 1) is parameterised by the same parameters a — {X, Y, A, R), but with different priors. 
In this case the evidence for each model is: 

Zi = J L{a)ni{a)da, (38) 

where the likelihood function L{a) is the same for both models and is given by dlOb . Although not required by the method, we will assume 
that for each model the prior is separable, as in equation Jilt , so that for i = 0, 1: 

n,{a)=TT,{X)n^{A)n,{R). (39) 

where X is the vector position of the source. 

Moreover, we shall assume uniform priors within specified ranges on each parameter, although the method is easily generalised to 
non-uniform priors. The general form of our models for the data are: 

Ho ~ 'there is no source centred in the region S", 
Hi — 'there is one source centred in the region S". 

If the region S corresponds to the coordinate ranges [Xmin, ^max] and [ymin, ^max], then 

fV|5'| if^G 5" 
I otherwise 

where j5j is the area of the region S. We assume the prior on R is also the same for both models, so tto{R) = ni{R) = tt{R), but the priors 
on A are substantially different. Guided by the forms for Ho and Hi given above, we take 7ro(j4) — S{A) (which forces A — 0) and 

^^(^) ^ Jl/AA if ^ G [A^in, A^ax] 

1 otherwise ' 

One could, of course, consider alternative definitions of these hypotheses, such as setting Hty. A ^ Aum and Hi: A > ^iim, where Aum is 
some (non-zero) cut-off value below which one is not interested in the identified object. We shall not, however, pursue this further here. 
Given the above forms of the priors, the evidence for model Ho is simply: 



dX dA dR TT® 5{A) n{R) L{X, A, R) (42) 

= Lq J dXdRTT{X)n{R) ^ Lo, (43) 
since Lo = L{X, A — 0,R) is a constant and the priors are normalised. For model Hi, the evidence reads 

Zi{S) = / dXn{X) J dAdRTi{A)-n{R)L{X,A,R) (44) 

dZ7r®Pi®, (45) 



L 



= ^//i®'^ (46) 

where we wrote explicitly the dependence of the evidence on the chosen spatial region 5"; we have also defined the (unnormalised) two- 
dimensional marginal posterior Pi ® and |S| is the area of the region S. 

So far we have not addressed the prior ratio Pr{Hi)/ Pr{Ho) in (|5}. Although this factor is often set to unity in model selection 
problems, one must be more careful in the current setting. For the sake of illustration and simplicity, let us assume that the objects we seek 
are randomly distributed in spatial position, i.e. they are not clustered. In that case, if fis is the (in general non-integer) expected number of 
objects per region of size l^j, then the probability of there being A'^ objects in such a region follows a Poissonian distribution: 

Pr(iV|/i,) = (47) 



Thus, bearing in mind the above definitions of Hq and Hi, we have: 

Pr(Hi) 



Pr{Ho] 

Hence, the key equation lO for model selection becomes: 



= MS- (48) 



Pr{Hi\d) Zi{S) I,Pi{X)dX 
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where /i = is the expected number sources (centres) per unit area. 

There is a certain degree of freedom when choosing the level p must reach in order t o cons ider that model H\ is present (a detection) 
or not. This ambiguity may only be overcome by advocating decision theory (see Jaynes From now on, we will always assume a 

criterion of 

symmetric loss = "an undetected source is as bad as a spurious one." (50) 

When introducing the criterion of symmetric loss the condition for detection immediately becomes uniquely defined. One accepts the detec- 
tion if 



P > 1, (51) 

and rejects it otherwise. One can change our decision criteria, for instance suppose we want to put emphasis on reliability (which means we 
are willing to accept an increase in the number of undetected sources by a significant decrease in the fake ones), than one must change the 
threshold for acceptance/rejection. This means that one must change the value that p must reach. 

The only remaining issue is the choice of the region S. Our method for locating peaks in the posterior is direct (local) maximisation, 
which yields the parameter values a = {X,A,R) at the peak. 

In this peak-based approach, S is taken to be a region enclosing the entire posterior peak in the {X) - subspace (to a good approximation). 
This, in fact, requires a little care. If S were taken to be the full mapped region, then the evidence Zi{S) would have contributions from 
many local peaks in the posterior. For each putative detection, however, we are only interested in the contribution to Zi {S) resulting from 
the posterior peak of interest. In practice, our Gaussian approximation to the posterior around this peak means that other peaks will not 
contribute to our estimate of Zi{S), thus making a virtue out of a necessity. Moreover, as we show below, provided the region S does enclose 
the posterior peak of interest in the (X) - subspace, the resulting model selection ratio p will be independent of the region size \S\. 

In the present case, using J46t and assuming uniform priors on A and R, one has: 

^'^'^ = W\ Is ^ WKaar I '''' 

Making a 4-dimensional Gaussian approximation to the posterior about its peak a, one obtains 

where the elements of (minus) the Hessian matrix C^^ are given by l |23b evaluated at the peak a. The above expression again ignores possible 
edge effects due to abrupt truncation of the posterior by the priors. In Section lMl we give an account of how we solved this problem. Using 
the expressions l llOt , l ll6t and l ll7t for the likelihood, one finds (in logarithms) 

In p^ In PS- ln\S\ + 21n(27r) - ln{AAAR) + ^a(R)[A{X, R)f + i In |C(a)|. (54) 



Most importantly, since 



AtsocIS*!, (55) 



we see that \np is independent of the size \S\ of the region considered in the (X_) - subspace, provided S encloses entirely (to a good 
approximation) the peak of interest. 



6.1 Acceptance/rejection threshold 

Consider the variable i' 



(56) 



where (nirij) — a'^Mij — Nij and is the generalised signal-to-noise ratio for a unit amplitude object (according to Savage & Oliver 
( l2006h notation). This is the same as 

.,_ d-N-H{a) ^^^^ 



which is the more common form one may find of v = "the normalized field amplitude" (Lopez-Caniego et al. Ilool)).' 'The normalized field 
amplitude" is the amplitude of a random stationary Gaussian field whose power spectrum [Bij])} satisfies the condition B{r])dri — 1, 
ie, (T = 1. Using Parseval theorem, it is straightforward to show that the power spectrum of some background that was filtered by an 
un-normalized matched filter, A4{'q), satisfies: 



/ 



Mir]) drj ~ t{afN'h{a). (58) 



Assuming that p is neither a function of the position nor a function of the considered area, this implies that p must be a constant. These 
conditions have been obtained using the principle of indifference along with the restriction i55\ . Integrating, one obtains: 
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Figure 4. Right hand side of inequality J60t as function of ISNR = Ay^, 
The arrow shows the direction of increasing 'P 



where N is the expected total number of sources in the patch and As is the total area of the patch. 

Let us now recover the condition for symmetric loss, p > 1, together with the key result for model selection l l49t . Substituting l l59t into 
it and using expression l llOt we obtain: 

.>^ + -^ (60) 



where V (the prior term) is: 



^ = ln(^)+ln((2^)-^lC(a)rt). (61) 



where Vt is the total parameter volume, ie, Vt ~ {A,nax — Amim) {Rmax — Rmin) As and A*" is the expected number of sources in the 
map. 

In Figure |4] we display the right hand side of inequality ( 1601 ) as function of ISNR = A^/a. We plot several curves for typical values 
v. Something which is immediately evident is that each curve has a lower bound which depends on V which in turn depends on the priors. 
Regardless of how small the peak is there is always a lower bound for it to be considered a true detection and not merely a noise fluctuation. 
This is to be expected from an assessment condition which enforces a policy of robustness against false detections (spurious). The threshold 
curve rises on both directions: when the signal (peak) is too weak because there is a strong possibility of being a noise fluctuation; and when 
the signal is high because there is a lot of evidence supporting the presence of an object. Hence, rising up the threshold assures an extra level 
of security against spurious detections without compromising the degree of detection. 

Expanding V using the results from subsection 15 .4| and substituting them into ( 1611 ) we obtain: 

^-lnr^^4^V (62) 



327^2^/0 7jVi3 7^ 

The curves in figure|5]show the dependency of the threshold for detection on the estimated amplitude A = "estimated source amplitude", 
of the putative source under study. We used data that close mimics the example from HM03: Vt = 81000 (5 x 0.5 x 32400), A'^ = 8, a = 1, 
R £ [5, 10] and A £ [0.25, 0.75]. We plot three curves for each prior with R taking the values {5(blue), 7(green), lO(red)}. 



6.2 Detection and characterisation robustness 

The Bayesian framework provides the necessary tools for both of the primary actions of PowellSnakes: detection and characterization. 
However, the impact of a mismatched prior parameter value has considerably different consequences depending on what we are doing. 
According to Jaynes j2003l) . the Bayesian inference rules prescribe that when we have a sufficiently informative likelihood the posterior is 
only slightly affected by the choice of the priors. Hence, one may expect that a prior mismatch will only have a minimal impact when we 
are dealing with bright sources. Our simulations are consistent with these predictions. But when we are tackling faint sources instead, the 
situation changes. Now the likelihood is no longer very informative and the priors play a part. Although, when doing parameter estimation 
and since we are using flat priors only, they do not affect the parameter estimates. On the other hand, something quite different happens when 
doing the detection step. When performing model assessment the priors must be fully normalized. Choosing a wrong prior parameter range 
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Figure 5. Right hand side of inequality (60) as function of "estimated source amplitude" = A. Three values of R were used (pixel units): 5 (blue), 7 (green), 
10 (red). 



can have an important impact on the quality of the detection, either by significantly increasing the number of false hits (spurious) or , on the 
contrary, making the algorithm become "blind" to detecting sources (misses). Q 

6.3 Quality of the detection: An upper bound 

So far we have only discussed the condition on the detection related to the tuning of the threshold for acceptance/rejection to minimize 
the symmetric loss. We have said nothing about the level of success achieved. In other words, we know our procedure is tuned in o rder to 
minimize the number of undetected plus spurious sources, but we have not yet discussed how well we can do. Following Van Trees l l200lh 
the best a "detector" can achieve is when we have absolute prior certitude about the characteristics of the detection process which is the 
same as knowing for sure the true values for all parameters. This condition is also known as "simple hypothesis test" and translates into the 
following prior: 



7ro(a) = 5(ai - aio)5(a2 - a2o) • . . 5(ai - am), 



(63) 



where 5{x) is the Dirac delta function. Recovering expression l |49l l, making /is = 1 Pr(_ffi) = Pr(_H'o), which stands for, no previous 
information favours one model over the other = ''the data must say it all", using the condition for symmetric loss l ISlK we have that: 



Lo 



> 1. 



Taking logarithms on both sides of the inequality and using expression dlOt we obtain: 



(64) 



(65) 



where ao is the true parameter vector. The left hand side of the above inequality is an estimator of ■ It is very easy to show that the 

likelihood ratio estimator conditioned on ''there is no source", "'"^^'^"^ 
where ISNRo = Aoy/ao. Hence 



= r^Ho is Gaussian distributed with Ni-^ISNR^, ISNR^), 



ISNRo 



N{0,1). 



Substituting fiHg into inequality i65\ . we obtain 



1^1 Hn > ^ISNRo. 



(66) 



(67) 



Each time this inequality is satisfied we presume we are in the presence of a source, a true detection. But in our assumptions we have assumed 
the opposite. Hence, this is the condition for a spurious detection. Therefore, the probability for the occurrence of a spurious detection is: 



Pr( H, \Ho) = 



(68) 



where Hi means "once we have chosen Hi" and erf{x) is the "Gauss error function" . Since the loss is symmetric, the probability for the 
other type of error Pr( Hq \Hi), the probability that an undetected source occurs, should match that of the spurious one. Thus, 



In PowellSnakes (Version II) the flat prior on the source amplitude A has been replaced by a much more realistic power law: N{s) oc s ^, and the extreme 
sensitivity of the detection step upon the values of the prior parameters was gone 
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Loss \ 



Figure 6. Curve for the theoretical lower bound for Pv{Err) =Loss as a function of ISNRq 



I - erf I ^ ISNR„ 



Pr( Hu = Pr( Hi \Ho) = ^— (69) 

We are now in position to establish the lower bound for a detection. When performing the detection two independent and exclusive types of 
errors can occur: 

Spurious => PrilhHo)^ Pr(^ | J/o) Pr(ffo) ^^^^ 
Undetected => Pr(^ Hi) = Pr(^ | J/i) Pr(iyi). 
The total probability for an error to occur is 

Pr(£rrjao) = Pr{Ih\Ho)Pr(Ho) + Pr{H^\Hi)PT{Hi), (71) 
where the conditioning in ao explicity expresses the assumption of perfect prior knowledge on the parameters values ("simple hypothesis"). 
Taking advantage of the symmetry Pr( Ho \Hi) = Pr( Hi \Ho) and the normalization condition Pr(_ffo) + Pr(-ffi) = 1, we finally get 

l-erf( ^UfUki) 

Losso = Pr{Err\sLo) = ^ ^. (72) 

Figure|6]shows the dependence of the loss theoretical lower bound on ISNRq. One can see that ISNRo plays a pivotal role in defining 
an upper level of the quality of the detection process. 

In many situations the radius of the source intensity profile, as recorded in the pixels map, may be considered known and constant 
throughout the patch. One of these cases is that of considering point shapeless sources. In this case one can further assume, with a high degree 
of accuracy, that the pixelized intensity closely follows the PSF of the antenna. Thus, considering a statistically homogeneous background, the 
ISNRo becomes a function of the source intensity only, Aq, as ao = t{RofN-H{Ro) may now be considered constant . This separation 
allows us to make predictions about the minimum flux that one can reliably detect taking into consideration our goals and the experimental 
setup. Let us define "Normalized Integrated Signal to Noise Ratio" = NISNR which is the ISNR of an unit amplitude source template. 



NISNR =Va= ^Jt{R)TN-H{R) (73) 

This quantity is of considerable importance when establishing a boundary on the source fluxes that can be reliably detected, as it provides a 
scale of measure. One may think of A (the source amplitude) as being the ISNR measured in NISNR. For a background correlation matrix 
considered diagonal (white noise, eg 127b and the Gaussian source template ([6]l, an analytical evaluation of the N ISNRq is possible: 

N ISNRo = 2/^. (74) 

In all interesting scenarios one rarely has an allowed parameter domain which sums up to single point in parameter space ("simple 
hypothesis"). Now the question is how to define a single number which somehow describes a representative figure for the detection quality 
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Figure 7. Plot of equation 1771 . Eacli curve represents a different value of Im shown close to it 



upper bound of the ensemble of ''simple hypothesis" that constitute the volume of the parameter domain. Within a Bayesian framework we 
tackle this issue by treating the simple hypothesis parameters as nuissance parameters and integrating them out. Hence, our proposed solution 
is to average the ''simple hypothesis" loss limit, Lossq, over the properly normalized probability priors for the parameters: 



(Loss) = PiiErr) = J Pr ( Brr] a) tt (a) d^a, 



(75) 



Substituting ( 172b into the equation ( I75t above, and dropping the zero subscript once it no longer carries any special meaning, we obtain: 



(Loss) 



1 — er 



r ( V2 ISNR{b) \ 

-A — t — L ^(b) d^'b, 



(76) 



where b stands for the entire parameter set except the position coordinates. We are implicitly assuming a factorisable parameter prior (see 
formula [TTt. The above formula plays a key role when defining the expected quality for a given detection and measurement setup. If the 
predicted detection scenario has a low average ISNR, we should never expect a good detection performance. By experience we have 
verified that when tackling real situations the expected loss is usually much larger than the predicted theoretical bound. 

When considering the case where the prior on the source amplitude, n{A), can be assumed flat on the region of detection, and the source 
radius constant across the patch, to a good approximation an analytical evaluation of the above expression is possible: 



(Loss) 



2{h 



u ( 1 — erf 



V2u 



(77) 



where Im = "minimum ISNR" and Im = "maximum ISNR". From figure |7] one realise the importance of previewing the ISNR limits of 
our measurement setup in order to define the expected upper bound on the detection quality. 



6.4 Discussion on truncation of tlie posterior by the prior 

So far the procedure described (formula |54} does not take into account the fact that the prior might abruptly truncate the posterior before it 
falls (largely) to zero, and hence may lead to an overestimation of the integral. The purpose of this discussion is essentially directed towards 
the truncation of the posterior which happens in the {A, R} sub-space. When considering the position sub-space, the prior is independent of 
the area considered |s|, as far as this fully encompasses the likelihood peak, which is always the case when dealing with real scenarios. 

When we were dealing with the source shape and the beam melted into a single template of radius R embedded into a background of 
white noise only, the effect of truncation was already evident, especially when were tackling cases of low and very low ISNR (= A^/a), but 
not extremely severe. This has provided the rational behind the "timing parameter" see below. The effect was not severe because the extent 
of the area under the likelihood fitted fairly well inside the prior box because the A and R parameters error bars were comparatively small. 



6.4.1 The tuning parameter (j) 

When assuming flat priors for all the parameters, the priors become constants and therefore can be taken out of the evidence integrals. Thus, 
one may now use the approximate formula i54t as a fast way of evaluating the evidence integrals, as far as the truncation effects are kept 
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Table 1. Tuning parameter (p as function of (ISNRq) 



(ISNRq) is small ( 5 ) 


<l> e (0.66, 1] 


5 < (ISNRo) < 7 




(ISNRo) is lai-ge ( ^ 7 ) 


(1,1.33] 



under control. But especially when dealing with faint sources we should expect these effects to become significant and hence they must be 
taken into consideration. Therefore, we added a tuning parameter, 0, to correct for possible systematic deviations from the true value of the 
integral, ie: 

./.(2^)^^/^|C(a)|^/^ _ /,L(a)d"a 

Vt ^ 

The new expression for the assessment level curve (right hand side of inequality I60t which results from the introduction of the "tuning 
parameter", is just the old form where we have only changed the V expression: 

r = ln( , , V (79) 

6.4.2 How to evaluate the parameter (f) 

In the simple case of white noise it is possible to compute an average value of the tuning parameter: 

:^^\ =0(2.)-/^lC(^)l^/V^\ , (80) 
where, 14 = ''Parameter volume of a single peak". Substituting expression l IlOl l into l |80| l we get 

'^^ (27r)21C(a)|V2exp(ia) ' 



(81) 



where "Average value of the data under hypothesis H\ = {A)^jj" = "signal vector true value = sq". The integral in the numerator is 
difficult to compute, but restricting our analysis to assuming a white noise background only M^^ — I, obviates a numerical evaluation of 
this integral. 



(a) 1 s(a)"r5(a) 
2 2 ^ 



d^'a 

. (82) 



(27r)2|C(a)|i/2exp(ia) 

From expression l l82t . one sees that the value of (f> depends on sq, the true shape of the object. In Table [T] we display several values of (j) 
which result from the application of the above formula to typical scenarios (ISNRq = Aoy/oo). The values we obtained for the "tuning 
parameter" are consistent with those we first expected. When dealing with scenarios with faint signals =J> low (ISNRq) we anticipate the 
parameter values being close to the priors inferior bounds and at the same time large error bars are expected, hence the likelihood truncation 
effects should make a significant contribution to the value of the evidence integral. Thus, the low value of <j> should not be a surprise. When 
the opposite happens high (ISNRq) the Gaussian approximation to the likelihood peak is no longer accurate and the large <j) is just a 
consequence of the wider tails of true likelihood function. 



6.4.3 Impact of (j) on the performance of the algorithm 

Using a series of simulations (see section [S} for different detection scenarios, we estimated the impact of the tuning parameter, (f), on the 
performance of the algorithm. We concluded that: 

• The algorithm is not very sensitive to changes of 0. 

• The differences in performance never exceeded 1.5% when compared with the fiducial value 0=1. 

• The largest difference occurred when we dealt with very small values of (ISNRq) (< 4), or very large (> 8) . 

• And most important of all: "The exact values of <j> always produced the best results". 

Therefore the golden rule displayed in Table |2] should be applied. 
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Table 2. Tuning parameter cj>; The golden rule 



(ISNRo) ^5 use = 0.75 
5 < (ISNRo) < 7 use = 1.00 
(ISNRo) ^7 use (/) = 1.15 



7 INTRODUCING "COLOUR" 

When dealing with real astronomical detection problems, the background diffuse component of the maps is not usually dominated by the 
instrumental noise. Nowadays, the most common scenario is the one where the largest signal component of the background has an astro- 
nomical origin: The ubiquitous CMB and the galactic foregrounds (free-free, dust, synchrotron). Currently we are focusing on the CMB 
especially because it can be considered statistically homogeneous and isotropic across the entire sphere with great accuracy (assumptions of 
our simplified model). The CMB radiation field has already an intrinsic coherence length (~ 10'), which taking into consideration the current 
surveys resolution (usually lower than 1' per pixel), may be hardly considered white. In addition, as it is collected by an antenna with a finite 
aperture (usually several pixels), whose net effect, in Fourier space, is that of a low pass filter, an extra degree of correlation is injected into 
the background. Hence, only an algorithm which appropriately deals with a background whose power spectrum is not flat could possibly cope 
with such a setup. The power spectrum of the simplified background model we proposed (seel86t is not yet fully realistic. However, despite 
of its simplicity, this model, already introduces the main characteristics of this background. This allow us a direct comparison between the 
model predictions and the simulations results. When dealing with a background assumed to be generated by a stationary Gaussian process 
but now with an arbitrary coherence length, the condition Ml\ on the inverse correlation matrix no longer holds. Nevertheless, despite the fact 
the correlation matrix is no longer diagonal, if the condition of statistical homogeneity still holds, one may always assume the background 
correlation matrix to be circulant. Taking advantage of the circulant property of the correlation matrix and transforming to Fourier space, the 
problem significantly simplifies as the power spectrum of a circulant correlation matrix is always diagonal. This is not a severe limitation 
because considering patches of small size only, the condition for statistical homogeneity is a good approximation. We are assuming not 
only statistical homogeneity but isotropy as well. Most of the formulas presented here will hold even if we drop the condition for statistical 
isotropy. We shall ignore this fact and we will use the statistical isotropy condition throughout the remaining text. We do so because it gives 
us the opportunity for significant simplifications without restricting the opportunity for real applications, as this condition holds pretty well 
when tackling the great majority of the problems of interest. Expressing equation l |25t (multiplied by -1) in Fourier space one obtains 

ln(LHj-c +J —^^drj~-lJ—^^dn + J g(^) drj , (83) 

where the usual k = 2717]. In a similar way for the likelihood Lhq of the background 

where each of the symbols f , d are the Fourier transforms of r and d respectively, and B{ri) is the normalized Power Spectrum of the 
background, ie, B{ri)dr] = 1 <t4> cr = 1 In what follows the symbols with a tilde on top mean the Fourier transform of the original symbol 
(without the tilde). We are assuming that a pre-normalization of the map's pixels has already been done by dividing the value of each map's 
pixels by the map rms value (= a), which implies a value of cr = 1 for the resulting map. 

Let us now introduce, for the first time, correlation effects in the background diffuse component. This correlation is to be expected not 
only because it already exists in the original astronomical background (CMB), and in the diffuse foregrounds (galactic components), but also 
due to the effect of the antenna PSF. We are considering now a simplified model for the autocorrelation function of the background, after 
being passed through the antenna. This model is of the form: 



(xjXi)j = nfcdxj - Xij) = e , (85) 

where so is a measure of the "coherence" length of the Gaussian isotropic homogeneous stationary background field (see Goodman 

The autocorrelation function is already properly normalized as required by our p reviou s assumptions, ie, erf — (xi'x.i)^ = nt{0) — 1 

Transforming to Fourier space and using the Wiener- Khinchin theorem (Goodman bOOOb ) we get: 

Bbiv) = TT^ e '^""^ , (86) 
where B{r]) is the "Power spectrum" and rjo = (27rso)~^. 

^ There is not a complete consensus about the exact definition of "coherence length". We shall follow Goodman's definition, "coherence length" 
= \J coherence area, which applied to our particular form of the autocorrelation function gives v^^O 
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Figure 8. NISNR (c units) vs "source radius" R (pix) ; sq = 11-57 pix, wi = 1/5, ujs = 4/5 



Together with the astronomical components one should always expect the presence of "pixel noise". In our signal model, "pixel noise" 
stands for all the noise components resulting from the instrumental setup. We are assuming the "pixel noise" to be independent from pixel to 
pixel, hence configuring a white noise process. We also assume that the"pixe/ noise" is uncorrelated with all the astronomical components, 
thus allowing us to write the total background power spectrum as the sum of these two components: 

■BW = Wfl— e^K^) J^wi (87) 

where Bi{r]) — 1 (white noise process) and we added the constants {wi, wb} in order to allow the signal components to be present in 
different amounts, hence encompassing a broader range of possible scenarios. These constants must satisfy the normalization condition 
wi + Wb = 1 as the total power spectrum must integrate to unity. 

7.1 ISNR - the symmetric loss lower bound 

Let us begin by expressing in Fourier space, the equivalent representation of the ISNR, using the appropriate form of the Parseval theorem: 

ISNRie^r = r(a)N- V(a) = J dr,. (88) 

V 

First we will handle the simplest possible case where the radii of the sources are constant. We start by evaluating the NISNR: 



NISNR{a.) = 

as this will provide us with a metric to predict how deep we can go in the flux scale without compromising the detection quality. Substituting 
the expression ( I87t for the power spectrum and the unity amplitude Gaussian template l|6j into ( 1891 ) and evaluating the integral, (which has 
numerical solution only) and using typical values for the sources and background parameters we obtain the function plotted in figure[8] 

When we were dealing with a white noise only background the NISNR was proportional to the radii of the sources (see formula[74l(. 
However when we introduce the correlation, a completely different curve results. Now, the dependence of the NISNR on the object radius 
stops being linear and a complex shape emerges. 



/ 



i{T],a)i{-ri,a) 



dr] 



(89) 



7.2 The '■'whitening" equalizer 

A quantity which plays a central role in the detection process is the likelihood ratio of the competing models (see formulasl83land [84l l. After 
taking logarithms one obtains: 



Now let us assume that our source template may be modelled by the following expression which is a generalization of equation l|6j: 



T{-q) = h{b,3,ri)e' 



-i27r77 .r Q 



(91) 



where bp = all variables but position , i 
origin. Substituting into l |90| l we get 



-1, ro = (xo, j/o) and rj) is the Fourier transform of the source template centred in the 
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Figure 9. "whitened source" R <^ sq : radius R = 3.4 , NISNR ^ 23.31 ; Background parameters so = 11.57 , wj = 0.01, wb = 0.99; all distances in 
pixels 



where JT"^ is the inverse bi-dimensional Fourier transform. One should already be familiar with this expression: the second term is 1/2 of 
the ISNR expressed in Fourier space, which doesn't depend on the position coordinates, and the first, in the argument of the inverse Fourier 
transform, is the "Linear matched filter" applied to the map, where: 



"Linear matched filter" = — ' . (93) 

Further we can read this expression in a somewhat different manner. Let us write the power spectrum as the product of its square root (the 
expected amplitude spectrum), ie B{r]) — y^B{r])^B(r]). Substituting into the argument of the Fourier transform (first term of|92|l we have 



Evaluating the square of the first term of J94t ensemble average, ie its power spectrum we obtain: 



d(?7) h{bp,r]) ^^^^ 



(If)-' 

Hence, after applying the linear filter B{ri)^^^^ we removed the "colour" from the background transforming it into a "white" Gaussian 
random field. This is the reason why one calls the linear matched filter a "whitening equalizer". Let us name the resulting map after being 
"whitened": 

"whitened map" = Co{ri) = ^^^"^ (96) 
Of course, the sources buried in the map become "whitened^ as well. In fact the second term of l |94| l is nothing else but a whitened source: 



"whitened source" = ^{bfs,ri) = M^£2^1, (97) 



Rewriting formula \92\ using these new definitions 



In (^^) = :F-' {Cj{r,)i^{b0,ri)]^^ - i | ^{bp, r^f dr^. (98) 

It can be easily verified that this formula, where we used the "whitened" counterparts of our entities, reduces to the familiar white-noise 
likelihood ratio (formulas |64| and[65ll expressed for convenience in Fourier space. Hence, everything we said for the white noise case applies 
here, even when we ought to consider the effects of correlation in the background, as long as we use the "whitened' entities instead. 
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Figure 10. "whitened source" R ^ sq : radius R = 3.4 , NISNR ^ 1.47 ; Background parameters sq = 2.0 , wj = 1/5, wg = 4/5; all distances in 
pixels 



7.3 Fisher Analysis in Fourier space 

Proceeding with the same line of reasoning as that initiated in subsection l5.4l but this time expressing Lhi using Fourier space l l83t . we start 
by splitting the evaluation of the Fisher matrix into three different cases: 

• The first case refers to those coefficients whose differentiation variables 6; are not positional. We shall collectively name them as = C . 

""'^"^^i/""^^^ = - / Bi,)-^ . drj. (99) 

obidbj J dbi dhj 

These elements of the F matrix will be generally 7^ . 

• The second case deals with those coefficients where one of the differentiation variables belongs to set bi and the other to Xi , the position 
set. 

!— — ^ = i2tt / T]:,^ B{ri) '- . hibfi, 77) dr). (100) 

dbidxk J obi 

One may rewrite the expression dlOOI l taking advantage of the symmetries of the expressions involved: 

' dh*{bf3,r)) 



= -47r J Vt.u Bin) Im 



■ h{bf3,Tj) 



dri, (101) 



where 77 > means that the integral extends to all the Fourier modes which have rjx^ > (half of the Fourier plane). A necessary condition 
for those class of coefficients being equal to is h{b/3,ri) being real. On the other hand, h{bjs, r) having reflection symmetry on both axes, 
is a sufficient condition for that to happen, which is the case for our current model (formula[26t. 
• The third case is when we are dealing with the position parameters only. 

n 

As the example given above, if h{bp,r) has reflection symmetry on both axes, this is a sufficient condition for (where we are ignoring the 
trivial case /i(6/3, r) — ) 

d^{-\n{LH,)} _ / = fc/i ^jQ3^ 



dxkdxj I/O fc = j. 

to happen. This third set of the Fisher coefficients will be named as = P. 

Ordering the parameters by collecting the position parameters Xk in the upper left corner, the Fisher matrix will then exhibit a structure of 
the form displayed (Kl = in general a non zero value). 
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(104) 



C 



One may take advantage of the depicted symmetry of the Fisher matrix when trying to compute its inverse. 



F,-i 
F-} 



[c-^] 

As an illustrative example and again, with the help of formula l l34b , one may explicitly write the error bars on the position parameters 



(105) 



AX = Ay : 



1 



27r3/2 ^j;°r,3B{r,)-^h2{bp,r])dv ' 
where we have employed the circular symmetry of proposed template l |26| l. 



(106) 



7.4 Extending to real scenarios 

So far we have only considered an extremely simplified model. Our purpose was two fold: 

• Make the problem practical in terms of required resources. 

• Avoid obscuring and cluttering the fundamentals with complex "decoration" details. 

However, we should always keep in mind that the experimental setup we wish to address in the future, the Planck Surveyor satellite, and the 
object of its study, the full sky, adds other significant challenges as well. Three of them are worth of further discussion: 

The in-homogeneity of the statistical properties of the background: When introducing the galactic emission which is mainly limited to a 
narrow band around the galactic plane, a severe breakdown on the condition of statistical homogeneity occurs. Moreover the instrumental 
pixel noise is never completely homogeneous. As result of the scan strategy there are zones of the sphere which will be sampled a greater 
number of times than others, increasing the integration time and leading to lower noise levels than those of the areas where beam spends less 
time. Using small patches (~ 128/256 x 128/256 pixels) and re-computing the background for each one of them will decrease this problem 
to a level which can be safely ignored. Though, it must be noted this is not a limitation of the Bayesian framework which, on theoretical 
grounds, is prepared for dealing with in-homogeneous backgrounds equally well. We only take advantage of the circulant properties of the 
covariance matrix, transforming to Fourier space as an implementation technique which allows us a much simpler and fast solution to our 
problem rending it practical and efficient. However, the same set of assumptions is required in the derivation of the optimal linear filters 
together with an implicit assumption of Gaussianity (HM03). 

The non-Gaussianity of the galactic diffuse foregrounds: The galactic diffuse fields, for instances dust, which can show considerable 
values of non-Gaussianity, as result of the Central Limit Theorem, only introduce extremely small distortions on the Gaussian distribution of 
the Fourier modes. The non-Gaussian part of the field brings in no more than small correlations between the different Fourier modes which 
can be secur ely i gnored when compared for instances with the potentially much more serious non-homogeneity of the background (HM03) 
(Rocha et al. |2005) ). 

The "confusion" noise as result of the contribution to the background of the faint unresolved point sources: This is the dominant com- 
ponent of the background of the Planck high frequency ( 545 and 857 GHz) maps on high galactic latitudes. This component may be modelled 
as a Poissonian sampling process with a constant rate source. Any realistic background model should take this into account. 



8 RESULTS 

We tested the presented ideas performing an extensive set of simulations using two different toy models and different detection scenarios 
(the latter solely for the white noise only model (see subsection |8.2| (). In both cases we used a Gaussian source template The parameters 
for each source were drawn from uniform distributions with bounds given in table |4] The patches are square grids of 200 x 200 pixels with 
a border of 2x the largest source radius, in pixels. The sources were spread uniformly throughout the patch. Care was taken to avoid source 
clustering and the presence of sources in the patch's borders. We imposed a safe distance between sources by defining a "bounding" square, 
centred on the source with side = 7 x 2 x i? pixels wide, where R = ''Radius of object" (see table|4]for the values of 7). We forbade the 
overlapping of these "bounding" squares. 
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Type of simulation 


LISNR-16-fix 


LISNR-16-vai- 


HMcL 


{ISNR) 


3.84 


3.62 


4.68 


</> 


0.75 


0.66 


0.66 


^ simul 


5000 


1000 


10000 


% detections 


67.41 % 


56.41 % 


82.95 % 


% spurious 


9.60 % 


8.62 % 


8.19% 


Total error 


42.19% 


52.20 % 


25.15 % 



Table 3. Summary of the results from the simulations with a white noise background. A^simul is the number of patches which have been simulated, 4> is the 
tuning par ameter 

8.1 Performance of the algorithms 

We evaluated the performance of the different algorithms by computing the "Total Error" defined as: 

'Total Error = Percentage of 'undetected sources' + Percentage of 'spurious detections' 

where: 

• A 'undetected source' happens when: a simulated object is not recognized by the algorithm as a source. 

• A 'spurious detection' happens when: 

- An object is detected 

- There is no simulated object such that the entire set of its parameters belongs to the volume defined by the estimated parameters of 
the detected object ±3(Tp. 

• The 'Percentage' is computed dividing the quantities by the total number of sources in the patch. 
The algorithm that minimizes the 'Total error is the one that performs better. 

8.2 White noise background 

In Table[3]we show the performance of PowellSnakes, where, (j) is the tuning parameter. The sources are imbedded into a background which 
is a stationary Gaussian random field with a coherence length much smaller than one pixel, usually known as Gaussian white noise process. 
Two scenarios were tested: 

• HMcL (Hobson-McLachlan). With this example we tried to mimic, as exactly as possible, the HM03 "Toy problem 4.3" 

• LISNR (Low Integrated Signal to Noise Ratio). This example was inspired in another from HM03, where a SZ example was given, but 
we have only retained the range of the ISNR involved. We did not try to simulate the SZ case at all as we have always worked with the 
Gaussian source template. 

Two situations were investigated: 

- With a fixed number of sources per patch: "fix" 

- With a variable number of sources per patch: "variable". In this case the number of sources for each patch was drawn from an uniform 
distribution whose minimum limit was zero and the maximum was the value displayed in table|4](A^o6js) 

Table |4] has a summary of the parameters' values, background and foregrounds (sources), for each of the simulated examples. Figure [T] 
represents a typical likelihood manifold for this case (A and R dimensions suppressed). 

Each run (map complete cleanup) takes less than a second (~ 2 each second) on a regular PC (Pentium IV, 2.39 GHz). A considerable 
effort to optimize the code was made. An intelligent management of previously computed values was essential in achieving such an high 
performance. Provisions for parallelizing the code were made but not yet employed. 

8.2.1 Discussion 

One of our major goals when we started this project was making Bayesian methods run fast enough in order to allow us to collect significant 
statistics figures by performing massive simulations in a reasonable amount of time. We have fulfilled our goal, as you may verify in table 
[S] several thousands of simulations were performed in each proposed scenario making our figures statistically sounding. Our results are 
consistent with the theoretical framework presented here. Examples with lower ISNR show higher levels of "Total error" as expect and 
significantly higher than the lower bound (see subsection [63}. Cases with a fixed number of sources per patch always show a lower error. 
This should not be a surprise as in our priors we have assumed a constant expected number of sources per patch and the lower the dispersion 
in the number of sources the better the prior fitted the data, hence the higher performance. Worth of a reference is the "Tuning parameter" 
(subsection 16.4b . In table [3] we are only showing results achieved using the "Tuning parameter" values which performed better. All the 
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Detection models 




LISNR-16-var 


LISNR-16-fix 


Hobson-McLachlan ^ 


Rmax (pixels) 


5.4 


5.4 


10 


Rmin (pixels) 


3.21 


3.21 


5 




0.76 


0.8 


1 




0.2 


0.22 


0.5 


a 


1 


1 


2 


^objs 


16 (variable) 


16 (fixed) 


8 (variable) 


7 


1.5 


1.5 


2.5 



Table 4. Detection models parameters, where the Rmax and Rmin are the maximum and minimum radius respectively; and Amax and Amin are the 
maximum and minimum amplitude respectively; The radii units are pixels and the source amplitudes units are given in the same unit as a 




Figure 11. "Coloured" background - astronomical component; Power spectrum was modelled using formula (86); so = 11.57pixels 



presented examples are low ISNR cases, thus we have only employed values lower than 1 (see table O. These ''best choice' values are 
consistent with our predictions. 



8.3 "Coloured" background 

Our arrangement for the ''coloured" background simulations followed the same guidelines as those for the white noise only case. A greater 
care concerning the realism of the setup was taken into consideration. We used a pixel size of 1' x 1', and small patches of 200 x 200 pixels 
(~ 3.33° X 3.33°). The background consisted of two uncorrelated components: 

• A stationary Gaussian random field with a Gaussian power spectrum. The power spectrum, already smoothed by the antenna, was 
modelled by formula J86t with a coherence length scale parameter of sq — 11.57 pixels (see figure [TTt 

• Pixel noise with a "white" profile with constant level across the map 

• The CMB RMS level used was lOfiK and the pixel noise 20^K which together implied wi = 4/5 and wb = 1/5 (see expression|87| 
figure [T2I) 

The simulated sources had amplitudes ranging from 0.9 to 1.0, in normalized units (normalized by the total rms noise level). The radii of the 
sources were inside the interval [3.2, 5.0] pixels. The whole setup results in an average (ISNR) of ~ 3.84. In each simulated map we put 
16 objects with their parameters drawn from uniform distributions (whose limits are given above) . Following the same prescription as for 
the white noise only case, we prevented the sources from forming clusters and being placed on the patch borders (see figure [T3ll . Table ?? 
contains the results obtained. We have performed a total of 5000 simulations. 



8.3.1 Discussion 

When we introduced correlation in the background, the likelihood manifold has now considerably changed from the white noise only example. 
When keeping {A, R} constant, looking at the position subspace one may see that the likelihood maxima are now extremely narrow and 
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Figure 13. The simulated sources. The shape of the sources was assumed Gaussian ^6); all the parameters, {A, R, X, Y}, were drawn for uniform distributions 
(check the text for details) 




Figure 14. Likelihood ratio j — - (A and R dimensions suppressed) top view; The sources are the same as in figure fTS] the background parameters are those 
from the simulations: sq = 11.57 pixels, wj = 4/5, wg = 1/5, (NISNR) ~ 3.84 
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parameters: sq = 11.57 pixels, wj = 0.01, wb = 0.99, NISNR ~ 23.31 



high and they have become surrounded by subsidiary peaks of considerable height (see figure [2ll. This scenario is even more acute when 
the proportion of instrumental noise becomes lower in relation to the ''coloured' component. In figure l llSt we depict an exaggerated case 
(for clarity) where the instrumental noise is only 10% of the astronomical component. Nevertheless we should stress that cur rent d etection 
instrumentations are already getting very close to these values. The Planck mission LFI instrument (Planck ''Blue book" ( l2005h ) has a 
previewed average ratio of ~ 20% between the astronomical components and the detectors instrumental noise. Though a more balanced 
proportion should be expected on the majority of existing instrumental setups. When dealing with such scenarios the search for the likelihood 
maxima becomes harder. The likelihood peaks being thin and completely surrounded by a very high number of other fake peaks are extremely 
difficult to find. One now needs to unfold a much higher number of "snakes'", typically four times the number of the white noise only case. 
The use of the "jump procedure" is now mandatory in order to pass through the large accumulation of fake peaks which encircles the real 
ones. Since the likelihood peaks are now extremely thin, the error bars on the source localization are now extremely small, usually much less 
than a pixel. This is the equivalent of assuming that the estimated values for the position parameters perfectly match the true ones. When 
we start employing the Bayesian procedure for validation of the detection we should replace the flat position prior (see|52t by a Dirac delta 
centred on the values previously estimated (see ll07t for the position of the source: 



S{x - XQ)S{y - yo) 
= AAAR ^^^'^ 

As a final note on the speed of the code execution we must note the substantial decrease of performance when dealing with the "coloured 
background". The average time for completing a full patch is now about 8 seconds which is about an order of magnitude more than in the 
previous case. Despite this significant reduction, our initial purpose (performing massive amount of simulations) has not been compromised 
as one can still undertake thousands of simulations in a reasonable amount of time. 

A summary of the sequence of steps performed by PowellSnakes, Version I (the version here exposed), is given in Appendix A. 



9 CONCLUSIONS 

The detection and characterisation of discrete objects is a common problem in many areas of astrophysics and cosmology. When performing 
this task most methods assume a smoothly varying background with a characteristic scale or coherence length much larger then the scale of 
the discrete objects under scrutiny. However high-resolution observations of the cosmic microwave background, CMB, pose a challenge to 
such assumptions. CMB radiation has a coherence scale length of the order of ~ 10 arcmin, close to that of the objects of interest such as 
extragalactic 'point' sources or the Sunyaev-Zel'dovitch effect in galaxy clusters. In addition the instrumental noise levels can be greater than 
the amplitude of the discrete objects. The common approach for dealing with such difficulties is to apply a matched filter to the initial map and 
instead analyse the filtered map. These approaches have reported good performances. However the filtering process is only optimal among the 
limited class of linear filters and is dissociated from the subsequent object detection step of selection performed on the filtered maps. Hobson 
& McLachlan (l l2003h : HM03) were the first to introduce a Bayesian approach to the detection and characterization of discrete objects in 
a diffuse background. Its implementation was performed using Monte-Carlo Markov chain (MCMC) sampling from the posterior which 
was very computationally expensive. This prompted us to explore a new, fast method for performing Bayesian object detection in which 
sampling is replaced by multiple local maximisation of the posterior, and the evaluation of errors and Bayesian evidence values is performed 
by making a Gaussian approximation to the posterior at each peak. In this paper we presented such a novel approach, PowellSnakes. We 
start by pre-filtering the map and explicitly show that for a given value of R (radius of the source), peaks in the filtered field correspond to 
peaks in the log-likelihood considered as function of the position of the sources. We replace the standard form of the likelihood for the object 
parameters by an alternative exact form that is much quicker to evaluate. Next we launch A'^ downhill minimizations in the 2-dimensional 
(X,Y) using the Powell algorithm. In order to avoid local maxima in the likelihood surface we devised the so called 'jump procedure', a 
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new concept borrowed from the realm of quantum mechanics. Next we perform multiple Powell minimizations in the full 4-dimensional 
space (X,Y,A,R), starting each minimization from the positions found in the previous step. To evaluate the performance of an algorithm 
we introduced here for the first time the concept of 'symmetric loss' within the context of the proposed subjects. The Peak-based prior, P, 
was proposed and an upper bound on the quality of a detection presented. We showed that the quantity, 'Normalised Integrated Signal to 
Noise Ratio', NISNR , plays an important role when establishing a boundary on the source fluxes that can be reliably detected. We included a 
discussion on the truncation of the posterior by the prior and proposed a novel solution by introducing the 'tuning parameter', 0. We studied 
in detail the consequences of adding correlation into the diffuse background which makes the substructure where the sources are imbedded. 
As the Linear filter works as a whitening equalizer, the coloured noise case reduces to the white noise as long as we use the 'whitened' map 
and sources. Using simulations of several typical scenarios we showed that our approach performs very well on both white and coloured 
background. We found that our results are consistent with our theoretical framework. For the white noise background, cases with a lower, 
'Integrated Signal to Noise Ratio', ISNR, exhibit higher levels of 'Total Error' as expected. Cases with a fixed number of sources per patch 
show a lower error. The number of spurious detections increases with this prior however the number of detected sources increases by a larger 
amount resulting in a lower 'Total error'. When we introduce colour in the background the likelihood changed considerably; the maxima 
in the position subspace are now extremely narrow and high surrounded by subsidiary peaks of considerable height. The search for the 
likelihood maxima becomes harder: one has to resort to a larger number of 'snakes'. The usage of the jump procedure becomes mandatory. 
In this case the errors in the source localization are very small, usually much less than a pixel. This means that we can replace the flat position 
prior by a Dirac delta function centred on the values previously estimated. 

Furthermore this approach yields a speed-up over sampling-based methods of many 100s, making the computational complexity of 
the approach comparable to that of linear filtering methods. Here we presented PowellSnakes, in its first incarnation: 'PowellSnakes I'. An 
account of an updated implementation, 'PowellSnakes IF will be published shortly. The application of the method to realistic simulated 
Planck observations will be presented in a forthcoming publication. 
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10 APPENDIX A 

PowellSnakes is a new fast Bayesian approach for the detection of discrete objects immersed in a diffuse background. This new method 
speeds up traditional Bayesian techniques by: 

• replacing the standard form of the likelihood for the parameters characterizing the discrete objects by an alternative exact form that is 
much quicker to evaluate; 

• using a simultaneous multiple minimization code based on Powell's direction set algorithm to locate rapidly the local maxima in the 
posterior; and 

• deciding whether each located posterior peak corresponds to a real object by performing a Bayesian model selection using an approx- 
imate evidence value based on a local Gaussian approximation to the peak. The construction of this Gaussian approximation also provides 
the covariance matrix of the uncertainties in the derived parameter values for the object in question. 

This new approach provides a speed up in performance by a factor of 'hundreds' as compared to existing Bayesian source extraction 
methods that use MCMC to explore the parameter space, such as that presented by Hobson & McLachlan (200^. The method can be 
implemented in either real and Fourier space. In the case of objects embedded in a homogeneous random field, working in Fourier space 
provides a further speed up that takes advantage of the fact that the correlation matrix of the background is circulant. Its performance is found 
to be comparable if not better to that of frequentist techniques such as applying optimal and wavelet filters . Furthermore PowellSnakes 
has the advantage of consistently defining the threshold for acceptance/rejection based on priors, the same cannot be said of the frequentist 
methods. 

Let's start by recapitulating that the Powell's method is the prototype of "direction-set methods" for maximization or minimization of 
functions. These are the methods to apply when you cannot easily compute derivatives. However this method requires a one-dimensional 
minimization sub-algorithm, eg. Brent's method. The Brent's method is an interpolation scheme whereby you alternate between a parabolic 
step and golden sec tions. It is used in one-dimensional minimization problems without calculation of the derivative, (for more details see 
Numerical Recipes |l992|), pg 389, 395, 406). 
The steps followed in PowellSnakes, Version I, are: 

(i) Compute the classical linear matched filter A = J^i-^^-j for whole patch in the Fourier domain - the denominator is constant 
and needs to be evaluated only once which can be done in Fourier space as well 

(ii) Compute all the constants 

(iii) Fourier transform the map - a 4Mpix map takes at most 3 sees (FFTW) 

(iv) Filter the map (~ 2 sees) 

(v) transform back to the map space using the inverse fourier transform (~ 3 sees) 
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(vi) Evaluate the ln{ p[^^[jj ) map (~ 2 sees) 

(vii) Find the peaks of ln{ p^^J ) field - check if they are higher than a certain acceptance level: find the peaks using a Powell minimizer 
with jumps. The evaluation of the objective function is fast - use a kind of evaluation cache (pick a value from a precalculated array). 
However we still need to find the optimal radius, positions and amplitude in real space. When using the matched filter an average value for 
the radius of the source has been used and the values of the peaks are only approximate. Now we need to find the optimal parameters. 

(viii) Error bars are calculated from approximate versions of the analytic formulae (while MCMC provides these errors automatically) 

(ix) The stop criteria - no stop criteria - however there is a way of ensuring that no peak is le ft behind - we proceed as follows: first of 
all we can have an idea of the area (not the volume) under the likelihood peaks. Let's consider now a imaginary pixel of the size of this area. 
Split the map into this imaginary pixels and start a PowellSnakes search in the center of each one of them. It is unlikely to miss any of the 
significant likelihood maxima (specially if we use the jump technique). One might think this process to be time consuming but in fact this is 
not the case since the objective function is pre-calculated. 



10.1 Characteristics 

Let's now characterize PowellSnakes in terms of its speed and sensitivity. 



10.1.1 Speed 

The PowellSnakes algorithm is a two step process, in the first part of the algorithm we proceed as follows: 

(i) First the map is filtered with a linear matched filter given by: 

- _ t^(X,i?)N-M 

^^^'^)-tT(X,J?)N-it(X,i?)- 

The evaluation is done in Fourier space using the average radius of the possible range of radiae. The map is Fourier transformed again back 
to real space. If the radiae of the objects were all equal, and if we knew their true value, the resulted filtered field would be proportional to 
the likelihood field with the absolute maxima on top of the objects. When the available range of radiae is small, filtering with the 'average' 
matched filter will not produce absolute maxima but in the great majority of the situations a local peak is created over the objects. 

(ii) Next we search this map with the Powell minimizer in two dimensions (the position) with the jumps switched on. This is an extremely 
fast operation because the objective function is already pre-calculated, and we just need to pick up a value from an array, or better, compute 
a bilinear interpolation. 

In the second part of the algorithm: 

(i) Now we already have a good idea of where to look further: Around the maxima of the likelihood restricted to the subspace of the 
position of the objects. In step I) we produced a 'whitened' version of the map, and we 'whitened' the test objects, at the same time. At this 
point we switch off the jumps, and launch 'PowellSnakes' in all four dimensions (position, radius and amplitude), taking as starting points 
the previously found maxima restricted to the position subspace. Use the complete expression for the likelihood (no analytical shortcuts). 

(ii) Once Powell minimizer finishes, pick up the optimal values for the parameters and test them for acceptance/rejection. If and only if the 
maximum is accepted as 'good', subtract the object from the map. (This is different from the our old solution: sometimes, when we subtracted 
a rejected peak in the neighbourhood of a true but still undetected object, we ended up damaging the good peak. Unfortunately most of the 
times that is enough to prevent the real object to be detected. Right now we only subtract accepted peaks to avoid multiple detections of the 
same object. We think we should try to avoid the subtraction at all, since there is always the risk that a spurious detection and consequent 
subtraction will damage a good peak.) 



10.1.2 Sensitivity 

Now, let's consider the Sensitivity: 

(i) Analytical solutions are in general great, however when noise is taken into consideration the functions become hard to handle. Further- 
more most of the theorems don't apply in this case. Therefore one should try to avoid analytical solutions. 

(ii) Sometimes the estimated values of the parameters fall off the allowed range used to define the Bayesian criterion for acceptance. 
Therefore even if the error bars allow them, we don't accept them instead send them for testing (accept/reject), or reject them immediately. 
Rather you should attach them a very low probability and let the computation continue. 

(iii) PowellSnakes depends on several parameters in order to optimise its performance / sensitivity: ie, the number of launched "snakes", 
average number of sources per patch, etc. Since the detection conditions (background, noise, signal-to-noise ratio, etc.) can exhibit some 
drastically changes across the sphere, these parameters should be occasionally re-adjusted. In its current implementation PowellSnakes 
allows the definition of sphere zones. Within each zone a different set of these parameters may be loaded from the parameter file in order to 
tune its sensitivity / performance. 

(iv) Do not expect miracles: If the NISNR is low you cannot detect reliably. 
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10.1.3 Other technical details 

(i) The structure of the program was designed to be ready to take full advantage of a multiprocessor system. The performance will scale 
linearly with the number of processors. 

(ii) Implemented in ISO C++. 

(iii) Completely built on templates. This feature allows you to consider the precision you want. 

(iv) Designed as a 'framework', ie you do not need to understand how PowellSnakes works to use it. Inherit from its classes, customize 
it, and PowellSnakes will call your code when needed. 
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